AITopics | second descent

second descent

aec5e2847c5ae90f939ab786774856cc-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 13:37:43 GMT

artificial intelligence, inductive learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > Mexico > Yucatán > Mérida (0.04)
Asia > Pakistan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Neural Information Processing Systems | Dec-26-2025, 13:46:19 GMT

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a _U-shaped curve_ reflecting a transition between under- and overfitting regimes. However, motivated by the success of overparametrized neural networks, recent influential work has suggested this theory to be generally incomplete, introducing an additional regime that exhibits a second descent in test error as the parameter count $p$ grows past sample size $n$ -- a phenomenon dubbed _double descent_. While most attention has naturally been given to the deep-learning setting, double descent was shown to emerge more generally across non-neural models: known cases include _linear regression, trees, and boosting_. In this work, we take a closer look at the evidence surrounding these more classical statistical machine learning methods and challenge the claim that observed cases of double descent truly extend the limits of a traditional U-shaped complexity-generalization curve therein. We show that once careful consideration is given to _what is being plotted_ on the x-axes of their double descent plots, it becomes apparent that there are implicitly multiple, distinct complexity axes along which the parameter count grows. We demonstrate that the second descent appears exactly (and _only_) when and where the transition between these underlying axes occurs, and that its location is thus _not_ inherently tied to the interpolation threshold $p=n$. We then gain further insight by adopting a classical nonparametric statistics perspective. We interpret the investigated methods as _smoothers_ and propose a generalized measure for the _effective_ number of parameters they use _on unseen examples_, using which we find that their apparent double descent curves do indeed fold back into more traditional convex shapes -- providing a resolution to the ostensible tension between double descent and traditional statistical intuition.
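For readers who have not seen the linear-regression case mentioned in the abstract, the following is a minimal sketch of how such a curve is usually produced. It is not the paper's experiment: the data model (isotropic Gaussian features, a dense random coefficient vector), the use of the first `p` of `d` features, and all default values are assumptions chosen for illustration. The estimator is minimum-norm least squares via the pseudoinverse, which interpolates the training data once `p >= n`.

```python
import numpy as np

def min_norm_test_mse(n=40, p_max=120, d=200, noise=0.5, trials=20, seed=0):
    """Average test MSE of minimum-norm least squares using the first p of d features.

    Hypothetical setup for illustration only: Gaussian features, dense true
    coefficients, additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    ps = np.arange(5, p_max + 1, 5)
    errs = np.zeros(len(ps))
    for _ in range(trials):
        beta = rng.normal(size=d) / np.sqrt(d)
        X_tr = rng.normal(size=(n, d))
        X_te = rng.normal(size=(500, d))
        y_tr = X_tr @ beta + noise * rng.normal(size=n)
        y_te = X_te @ beta + noise * rng.normal(size=500)
        for i, p in enumerate(ps):
            # pinv gives ordinary least squares for p < n and the
            # minimum-norm interpolating solution for p >= n.
            w = np.linalg.pinv(X_tr[:, :p]) @ y_tr
            errs[i] += np.mean((X_te[:, :p] @ w - y_te) ** 2)
    return ps, errs / trials

if __name__ == "__main__":
    ps, errs = min_norm_test_mse()
    for p, e in zip(ps, errs):
        print(f"p={p:4d}  test MSE={e:10.3f}")
```

In this toy setting the test error spikes around `p = n` and falls again as `p` grows further. Note that the x-axis here is a raw parameter count, which is exactly the quantity whose interpretation the abstract calls into question.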

double descent, name change, rethinking parameter, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.38)

Understanding Overparametrization in Survival Models through Interpolation

Liu, Yin, Cai, Jianwen, Li, Didong

arXiv.org Machine Learning | Dec-19-2025

Classical statistical learning theory predicts a U-shaped relationship between test loss and model capacity, driven by the bias-variance trade-off. Recent advances in modern machine learning have revealed a more complex pattern, double-descent, in which test loss, after peaking near the interpolation threshold, decreases again as model capacity continues to grow. While this behavior has been extensively analyzed in regression and classification, its manifestation in survival analysis remains unexplored. This study investigates overparametrization in four representative survival models: DeepSurv, PC-Hazard, Nnet-Survival, and N-MTLR. We rigorously define interpolation and finite-norm interpolation, two key characteristics of loss-based models to understand double-descent. We then show the existence (or absence) of (finite-norm) interpolation of all four models. Our findings clarify how likelihood-based losses and model implementation jointly determine the feasibility of interpolation and show that overparametrization should not be regarded as benign for survival models. All theoretical results are supported by numerical experiments that highlight the distinct generalization behaviors of survival models.

finite-norm interpolation, interpolation, logit, (16 more...)

arXiv.org Machine Learning

2512.12463

Country:

North America > United States > North Carolina (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

On The Presence of Double-Descent in Deep Reinforcement Learning

Veselý, Viktor, Todorov, Aleksandar, Sabatelli, Matthia

arXiv.org Machine Learning | Nov-11-2025

The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
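The abstract does not say how Policy Entropy is computed. A common choice for discrete action spaces is the Shannon entropy of the action distribution, averaged over a batch of states, and the sketch below assumes that definition. The function name `policy_entropy` and the logits-based interface are illustrative, not taken from the paper.

```python
import numpy as np

def policy_entropy(logits):
    """Mean Shannon entropy (in nats) of categorical policies.

    `logits` has shape (batch, num_actions); each row holds the unnormalised
    action scores an actor network would output for one state.
    """
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=-1, keepdims=True)       # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return float(-(probs * log_probs).sum(axis=-1).mean())

if __name__ == "__main__":
    uniform = np.zeros((1, 4))                   # maximally uncertain policy
    peaked = np.array([[10.0, 0.0, 0.0, 0.0]])   # nearly deterministic policy
    print(policy_entropy(uniform), policy_entropy(peaked))
```

Logged once per training epoch, this value falls as the policy commits to fewer actions, which is the kind of sustained reduction the abstract associates with the second descent region.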

artificial intelligence, deep reinforcement learning, machine learning, (15 more...)

arXiv.org Machine Learning

2511.06895

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

aec5e2847c5ae90f939ab786774856cc-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 04:49:10 GMT

artificial intelligence, inductive learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > Mexico > Yucatán > Mérida (0.04)
Asia > Pakistan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models

Fraij, Ahmad, Dauncey, Sam

arXiv.org Artificial Intelligence | Sep-30-2025

Data scarcity drives the need for more sample-efficient large language models. In this work, we use the double descent phenomenon to holistically compare the sample efficiency of discrete diffusion and autoregressive models. We show that discrete diffusion models require larger capacity and more training epochs to escape their underparameterized regime and reach the interpolation threshold. In the strongly overparameterized regime, both models exhibit similar behavior, with neither exhibiting a pronounced second descent in test loss across a large range of model sizes. Overall, our results indicate that autoregressive models are more sample-efficient on small-scale datasets, while discrete diffusion models only become competitive when given sufficient capacity and compute.

diffusion model, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.24974

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Reviews: On the number of variables to use in principal component regression

Neural Information Processing Systems | Jan-27-2025, 17:33:49 GMT

In the paper, the authors discuss PCR, a well-known variant of regression models, and show the existence of a "double descent" phenomenon. The paper is technically sound and relatively well-written. I checked most of the math, and it is correct and reasonable to follow. I do have some concern that too much of the space is taken up by the algebra, which could make it difficult for readers to grasp the high-level intuition, especially if they do not have enough time to plough through the equations. Considering the space limit for a NeurIPS submission, I think it would be better to move some of the proofs to the appendix and add a discussion/conclusion section that highlights the intuitions.
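For context on the method under review, principal component regression (PCR) regresses the response on the top `k` principal components of the inputs, so `k` is the "number of variables" in the paper's title. The sketch below is a generic textbook version, not the paper's exact estimator: centering the data and the helper name `pcr_fit_predict` are assumptions.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal component regression keeping the top-k components."""
    mean_x, mean_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mean_x
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                              # (d, k) projection basis
    Z = Xc @ V_k                                # training scores on k components
    gamma, *_ = np.linalg.lstsq(Z, y_train - mean_y, rcond=None)
    return (X_test - mean_x) @ V_k @ gamma + mean_y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=60)
    for k in (1, 4, 8):
        pred = pcr_fit_predict(X[:40], y[:40], X[40:], k)
        print(k, float(np.mean((pred - y[40:]) ** 2)))
```

Sweeping `k` and plotting the held-out error is the kind of curve in which the reviewed paper reports a double descent.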

descent, double descent, principal component regression, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Neural Information Processing Systems | Jan-19-2025, 19:06:37 GMT

double descent, rethinking parameter, statistical learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Curth, Alicia, Jeffares, Alan, van der Schaar, Mihaela

arXiv.org Machine Learning | Oct-29-2023

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However, motivated by the success of overparametrized neural networks, recent influential work has suggested this theory to be generally incomplete, introducing an additional regime that exhibits a second descent in test error as the parameter count p grows past sample size n - a phenomenon dubbed double descent. While most attention has naturally been given to the deep-learning setting, double descent was shown to emerge more generally across non-neural models: known cases include linear regression, trees, and boosting. In this work, we take a closer look at the evidence surrounding these more classical statistical machine learning methods and challenge the claim that observed cases of double descent truly extend the limits of a traditional U-shaped complexity-generalization curve therein. We show that once careful consideration is given to what is being plotted on the x-axes of their double descent plots, it becomes apparent that there are implicitly multiple, distinct complexity axes along which the parameter count grows. We demonstrate that the second descent appears exactly (and only) when and where the transition between these underlying axes occurs, and that its location is thus not inherently tied to the interpolation threshold p=n. We then gain further insight by adopting a classical nonparametric statistics perspective. We interpret the investigated methods as smoothers and propose a generalized measure for the effective number of parameters they use on unseen examples, using which we find that their apparent double descent curves do indeed fold back into more traditional convex shapes - providing a resolution to the ostensible tension between double descent and traditional statistical intuition.
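The smoother view in the abstract can be made concrete with ridge regression, whose prediction at any input x is a weighted sum of the training labels, ŷ(x) = s(x)ᵀy. The classical effective number of parameters is the trace of the training smoother matrix. The abstract does not spell out its generalized measure for unseen examples, so the squared-norm variant below (n times the mean of ‖s(x)‖² over evaluation inputs) is an assumption about what such a measure could look like, and the function names are illustrative.

```python
import numpy as np

def smoother_weights(X_train, X_eval, lam):
    """Rows are s(x): the ridge weights placed on each training label."""
    p = X_train.shape[1]
    A = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p), X_train.T)  # (p, n)
    return X_eval @ A                                                      # (m, n)

def effective_params_trace(X_train, lam):
    """Classical effective parameters: trace of the training smoother matrix."""
    return float(np.trace(smoother_weights(X_train, X_train, lam)))

def effective_params_norm(X_train, X_eval, lam):
    """Assumed variance-style measure: n * mean squared norm of s(x) over X_eval."""
    S = smoother_weights(X_train, X_eval, lam)
    return float(X_train.shape[0] * np.mean(np.sum(S ** 2, axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, X_te = rng.normal(size=(50, 5)), rng.normal(size=(200, 5))
    for lam in (1e-10, 1.0, 100.0):
        print(lam,
              effective_params_trace(X_tr, lam),
              effective_params_norm(X_tr, X_te, lam))
```

For unregularised least squares with fewer features than samples, the smoother matrix is a projection, so both measures evaluated on the training inputs equal the raw feature count. They differ once regularisation is applied or the evaluation inputs are unseen points.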

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2310.18988

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Mexico (0.14)

Genre: Research Report (1.00)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
Energy > Oil & Gas > Midstream (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

VC Theoretical Explanation of Double Descent

Lee, Eng Hock, Cherkassky, Vladimir

arXiv.org Artificial Intelligence | Sep-29-2022

There has been growing interest in the generalization performance of large multilayer neural networks that can be trained to achieve zero training error while generalizing well on test data. This regime is known as 'second descent', and it appears to contradict the conventional view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. This paper presents a VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC-generalization bounds. We illustrate an application of analytic VC-bounds for modeling double descent for classification, using empirical results for several learning methods, such as SVM, Least Squares, and Multilayer Perceptron classifiers. In addition, we discuss several reasons for the misinterpretation of VC-theoretical results in the Deep Learning community. There have been many recent successful applications of Deep Learning (DL). However, at present, various DL methods are driven mainly by heuristic improvements, while theoretical and conceptual understanding of this technology remains limited. For example, large neural networks can be trained to fit available data (achieving zero training error) and still achieve good generalization for test data. This contradicts the conventional statistical wisdom that overfitting leads to poor generalization. This phenomenon has been systematically described by Belkin et al. (2019), who introduced the term 'double descent' and pointed out the difference between the classical regime (first descent) and the modern one (second descent).
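The abstract refers to analytic VC-bounds for classification without stating them. For orientation, the form commonly given in the VC literature is reproduced below. It is an assumption that the paper uses this form, and the constants a_1, a_2 and the confidence level η are not taken from the abstract. Here n is the sample size, h the VC dimension, and R_emp the training error.

```latex
% Classical VC generalization bound for classification; holds with probability
% at least 1 - \eta. The constants a_1, a_2 are left symbolic (assumption).
R(\omega) \;\le\; R_{\mathrm{emp}}(\omega)
  + \frac{\varepsilon}{2}\left(1 + \sqrt{1 + \frac{4\,R_{\mathrm{emp}}(\omega)}{\varepsilon}}\right),
\qquad
\varepsilon \;=\; a_1\,\frac{h\left(\ln\frac{a_2\,n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}.
```

When the training error is zero the bound reduces to R(ω) ≤ ε, so in the interpolating regime it depends only on the ratio of VC dimension to sample size, not on the raw number of weights.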

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2205.15549

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
